 vector function


Tensor-Based Foundations of Ordinary Least Squares and Neural Network Regression Models

arXiv.org Artificial Intelligence

This article introduces a novel approach to the mathematical development of Ordinary Least Squares and Neural Network regression models, diverging from traditional methods in current Machine Learning literature. By leveraging Tensor Analysis and fundamental matrix computations, the theoretical foundations of both models are detailed and extended to their complete algorithmic forms. The study culminates in the presentation of three algorithms, including a streamlined version of the Backpropagation Algorithm for Neural Networks, illustrating the benefits of this new mathematical approach. The following sections of this article rely on some important mathematical concepts and notations that are introduced in this section. However, we assume the reader is proficient in the basic and intermediate topics of Linear Algebra and Calculus, which will not be covered.


Robust Gaussian Process Regression with Huber Likelihood

arXiv.org Machine Learning

Gaussian process regression in its simplest form assumes normal homoscedastic noise and uses analytically tractable mean and covariance functions of the predictive posterior distribution obtained by Gaussian conditioning. Its hyperparameters are estimated by maximizing the evidence, commonly known as type II maximum likelihood estimation. Unfortunately, Bayesian inference based on a Gaussian likelihood is not robust to outliers, which are often present in observational training data sets. To overcome this problem, we propose a robust process model in the Gaussian process framework with the likelihood of the observed data expressed as the Huber probability distribution. The proposed model employs weights based on projection statistics to scale residuals and bound the influence of vertical outliers and bad leverage points on the latent function estimates, while exhibiting high statistical efficiency under Gaussian and thick-tailed noise distributions. The proposed method is demonstrated on two real-world problems and two numerical examples using datasets with additive errors following thick-tailed distributions such as the Student's t, Laplace, and Cauchy distributions.


Softmax intuition

#artificialintelligence

Consider a vector, for example (5, -0.5, 3, -2). We want to find a transformation such that the transformed vector represents a distribution: each component is between 0 and 1 and the components add up to 1 (so that they can be interpreted as probabilities). Each probability should reflect the magnitude of the original value (for example, 5 should be associated with the highest probability). An easy solution is found in two steps. First, we transform the original vector into a vector with positive components, so that the new components reflect the magnitudes of the original ones. To do this, we have to find a function f to transform the components.
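The two steps can be sketched in Python as follows. This is a minimal illustration, assuming that the function f is the exponential (the usual softmax choice, which the snippet above does not state explicitly); the name `softmax` and the max-shift for numerical stability are additions for the example.

```python
import math

def softmax(values):
    """Map real numbers to positive weights that sum to 1."""
    # Subtracting the maximum leaves the result unchanged
    # but keeps exp() from overflowing on large inputs.
    shift = max(values)
    # Step 1: make every component positive with f = exp.
    exps = [math.exp(v - shift) for v in values]
    # Step 2: divide by the total so the components add up to 1.
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([5, -0.5, 3, -2])
for value, p in zip([5, -0.5, 3, -2], probs):
    print(f"{value:>5}: {p:.4f}")
```

Because the exponential is increasing, the ordering of the original components is preserved: the largest input, 5, receives the highest probability.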


A Gentle Introduction To Vector Valued Functions

#artificialintelligence

Vector valued functions are often encountered in machine learning, computer graphics and computer vision algorithms. They are particularly useful for defining the parametric equations of space curves. It is important to gain a basic understanding of vector valued functions to grasp more complex concepts. In this tutorial, you will discover what vector valued functions are, how to define them and some examples.
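As a small illustration of a space curve given by parametric equations, the sketch below defines a helix as a vector valued function of one real parameter. The helix and the name `helix` are chosen for this example and are not taken from the tutorial itself.

```python
import math

def helix(t):
    """Vector valued function r(t) = (cos t, sin t, t).

    One real input t produces a point with three components.
    """
    return (math.cos(t), math.sin(t), t)

# Sample the curve at t = 0, pi/2, pi, 3*pi/2, 2*pi (one full turn).
points = [helix(k * math.pi / 2) for k in range(5)]
for x, y, z in points:
    print(f"({x:+.3f}, {y:+.3f}, {z:.3f})")
```

Each component is an ordinary scalar function of t; the x and y components trace the unit circle while the z component rises linearly.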


Linear Functions of vector arguments

#artificialintelligence

Such functions can be regarded as vector functions. Linear vector functions, also known as linear operators, are of great importance in linear algebra and its applications. We can write down the general representation of a linear form L(x) defined on an n-dimensional space K_n. Let e_1, e_2, …, e_n be an arbitrary basis of the space K_n, and denote the quantity L(e_k) by l_k (k = 1, 2, …, n). A morphism A = A(x) maps a linear space X into another linear space Y over the same field k.
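The general representation mentioned above is the standard one: if x has coordinates x_1, …, x_n in the chosen basis, then L(x) = l_1 x_1 + … + l_n x_n. The sketch below checks this numerically; the helper `linear_form` and the sample coefficients are hypothetical, and vectors are assumed to be given by their coordinates in the basis e_1, …, e_n.

```python
def linear_form(coefficients):
    """Return L with L(x) = sum over k of l_k * x_k, where l_k = L(e_k)."""
    def L(x):
        if len(x) != len(coefficients):
            raise ValueError("x must have one coordinate per basis vector")
        return sum(l_k * x_k for l_k, x_k in zip(coefficients, x))
    return L

# Hypothetical coefficients l_1, l_2, l_3 for a form on a 3-dimensional space.
l = [2.0, -1.0, 0.5]
L = linear_form(l)

# Evaluating L on the basis vectors recovers the coefficients l_k.
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
recovered = [L(e) for e in basis]
print(recovered)
```

The form is therefore fully determined by its n values on the basis vectors.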


Exploring Sentence Vector Spaces through Automatic Summarization

arXiv.org Machine Learning

Even so, Table I suggests that the performance of the Greedy algorithm is not based on the accuracy of the corresponding objective function. In particular, consider the two other strategies which try to maximize the same objective function: Brute Force, and Maximum Similarity (which simply selects Greedy or Brute Force based on which one creates a summary with a higher cosine similarity). Brute Force consistently and significantly creates summaries with higher cosine similarity to the document, outperforming the Greedy selector on its objective function. By construction, the Maximum Similarity algorithm outperforms in cosine similarity to an even greater degree. But both of these algorithms perform much worse than the Greedy algorithm. Deeper analysis into the decisions of the Greedy algorithm reveals some reasons for this discrepancy. It appears that the good performance of the Greedy algorithm results not from the associated objective function, but from the way in which it maximizes this objective function. In particular, the Greedy algorithm selects sentences with low cosine similarity scores in a vacuum, but which increase the cosine similarity of the overall sentence (Figure 1). To understand why this is true, we consider the step-by-step behavior of the Greedy algorithm.
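The difference between the two selectors can be sketched as follows. This is an illustrative toy, not the paper's implementation: it assumes a summary is represented by the sum of its sentence vectors, that the document vector is the sum of all sentence vectors, and that the objective is the cosine similarity between the two. All function names and the sample vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def vec_sum(vectors):
    """Component-wise sum (assumed summary representation)."""
    return [sum(column) for column in zip(*vectors)]

def score(indices, sentences, document):
    """Objective: similarity of the selected summary to the document."""
    return cosine(vec_sum([sentences[i] for i in indices]), document)

def greedy_select(sentences, document, k):
    """Add one sentence at a time, each time taking the best next step."""
    chosen = []
    for _ in range(k):
        remaining = [i for i in range(len(sentences)) if i not in chosen]
        best = max(remaining, key=lambda i: score(chosen + [i], sentences, document))
        chosen.append(best)
    return chosen

def brute_force_select(sentences, document, k):
    """Try every k-subset and keep the one with the highest objective."""
    candidates = combinations(range(len(sentences)), k)
    return list(max(candidates, key=lambda idx: score(idx, sentences, document)))

# Toy sentence vectors, made up for the example.
sentences = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.6, 0.8]]
document = vec_sum(sentences)
greedy = greedy_select(sentences, document, 2)
brute = brute_force_select(sentences, document, 2)
print(greedy, score(greedy, sentences, document))
print(brute, score(brute, sentences, document))
```

Under these assumptions Brute Force can never score lower than Greedy on the objective, since it searches every subset of the same size; the two differ only in how the objective is maximized.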


Binary classification of multi-channel EEG records based on the $\epsilon$-complexity of continuous vector functions

arXiv.org Machine Learning

A methodology for binary classification of EEG records which correspond to different mental states is proposed. This model-free methodology is based on our theory of the $\epsilon$-complexity of continuous functions, which is extended here (see Appendix) to the case of vector functions. This extension permits us to handle multichannel EEG recordings. The essence of the methodology is to use the $\epsilon$-complexity coefficients as features to classify (using well-known classifiers) different types of vector functions representing EEG records corresponding to different types of mental states. We apply our methodology to the problem of classification of multichannel EEG records related to a group of healthy adolescents and a group of adolescents with schizophrenia. We found that our methodology permits accurate classification of the data in the four-dimensional feature space of the $\epsilon$-complexity coefficients.